Efficient algorithms for conditional independence inference
The topic of the paper is computer testing of (probabilistic) conditional independence (CI) implications by the algebraic method of structural imsets. The basic idea is to transform (sets of) CI statements into certain integral vectors and to verify by computer the corresponding algebraic relation between the vectors, called the independence implication. We interpret the previous methods for computer testing of this implication from the point of view of polyhedral geometry. However, the main contribution of the paper is a new method based on linear programming (LP). The new method overcomes the limitation of former methods on the number of variables involved. We recall or describe the theoretical basis for all four methods involved in our computational experiments, whose aim was to compare the efficiency of the algorithms. The experiments show that the LP method is clearly the fastest. As an example of a possible application of such algorithms, we show that testing inclusion of Bayesian network structures, or testing whether a CI statement is encoded in an acyclic directed graph, can be done by the algebraic method.
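To make "transform CI statements into integral vectors" concrete, here is a minimal sketch of the standard semi-elementary imset from Studený's framework. A statement A ⊥ B | C over a variable set N maps to u = δ(A∪B∪C) + δ(C) − δ(A∪C) − δ(B∪C), a vector indexed by all subsets of N. The bitmask encoding and the function name `semi_elementary_imset` are illustrative assumptions. The sketch does not implement the paper's LP-based implication test.

```python
def semi_elementary_imset(a, b, c, n):
    """Imset of the CI statement A ⊥ B | C over variables {0, ..., n-1}.

    Returns a list of length 2**n indexed by subset bitmask (bit v set means
    variable v is in the subset), holding
    u = δ(A∪B∪C) + δ(C) − δ(A∪C) − δ(B∪C).
    Hypothetical helper for illustration only.
    """
    def mask(s):
        m = 0
        for v in s:
            m |= 1 << v  # set the bit for variable v
        return m

    A, B, C = mask(a), mask(b), mask(c)
    if A & B or A & C or B & C:
        raise ValueError("A, B and C must be pairwise disjoint")
    u = [0] * (2 ** n)
    u[A | B | C] += 1
    u[C] += 1
    u[A | C] -= 1
    u[B | C] -= 1
    return u


# Marginal independence of variables 0 and 1: entries for {}, {0}, {1}, {0,1}.
print(semi_elementary_imset({0}, {1}, set(), 2))  # [1, -1, -1, 1]
```

A set of CI statements is represented by the componentwise sum of such vectors, which is the kind of object on which the independence implication is then tested.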
Accuracy bounds for ensembles under 0-1 loss
This paper attempts to improve the quantitative understanding of how ensembles behave for discrete variables. A set of tight upper and lower bounds on the accuracy of an ensemble is presented for wide classes of ensemble algorithms, including bagging and boosting. The ensemble accuracy is expressed in terms of the accuracies of the members of the ensemble.
Since those bounds represent only best- and worst-case behavior, we also study typical behavior and discuss its properties. A parameterised bound is presented which describes ensemble behavior as a mixture of dependent base classifier and independent base classifier areas. Empirical results are presented to support our conclusions.
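The two extremes mentioned above can be made concrete for one simple case. This is a sketch only, not the paper's bounds. It assumes a binary task, an odd number n of members, plain (unweighted) majority voting, and identical member accuracy p. Bagging fits this setting loosely, but boosting uses weighted votes. If the members err independently, the ensemble is correct when at least (n+1)/2 members are correct, which is a binomial tail. If the members are fully dependent (identical), the ensemble accuracy is simply p. The function name is hypothetical.

```python
from math import comb


def majority_vote_accuracy_independent(n, p):
    """Accuracy of an unweighted majority vote over n independent binary
    classifiers, each correct with probability p.

    Computes P[at least (n + 1) / 2 members are correct] as a binomial tail.
    Hypothetical helper; assumes odd n so that no ties occur.
    """
    if n % 2 == 0:
        raise ValueError("use an odd number of members to avoid ties")
    m = n // 2 + 1  # smallest winning number of correct votes
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(m, n + 1))


p = 0.7
print(majority_vote_accuracy_independent(3, p))  # about 0.784: independence helps
print(p)  # fully dependent (identical) members: ensemble accuracy stays at p
```

Real ensembles usually fall between these two cases, because their members are partly correlated.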
Biostratigraphic research in some Famennian sections of the Avesnois (northern France)
Conodonts and goniatites from four "old" Famennian sections in the Avesnois (France) have been carefully studied. For the first time, the biostratigraphic position of these sections is determined.
Stratigraphic interpretation of the Tohogne borehole (Province of Luxembourg): the Devonian-Carboniferous transition
The Tohogne borehole section, from the Lower Tournaisian into the Upper Famennian, has a remarkable micropalaeontological content (conodonts, foraminifers, spores), which enabled a detailed subdivision of these strata. New biostratigraphic and systematic palaeontological data are presented, together with their palaeogeographic implications and correlations with reference sections.
Phylogeographic analysis of the Bantu language expansion supports a rainforest route
The Bantu expansion transformed the linguistic, economic, and cultural composition of sub-Saharan Africa. However, the exact dates and routes taken by the ancestors of the speakers of the more than 500 current Bantu languages remain uncertain. Here, we use the recently developed “break-away” geographical diffusion model, specially designed for modeling migrations, with “augmented” geographic information, to reconstruct the Bantu language family expansion. This Bayesian phylogeographic approach with augmented geographical data provides a powerful way of linking linguistic, archaeological, and genetic data to test hypotheses about large language family expansions. We compare four hypotheses: an early major split north of the rainforest; a migration through the Sangha River Interval corridor around 2,500 BP; a coastal migration around 4,000 BP; and a migration through the rainforest before the corridor opened, at 4,000 BP. Our results yield a topology and timeline for the Bantu language family that support the hypothesis of an expansion through Central African tropical forests at 4,420 BP (95% highest posterior density interval: 4,040 to 5,000 BP), well before the Sangha River Interval opened.
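The date above is reported with a 95% highest posterior density (HPD) interval. A common way to estimate such an interval from MCMC draws is to take the shortest interval that covers 95% of the sorted samples. The sketch below is a generic illustration and assumes a unimodal posterior. It is not the authors' software, and the name `hpd_interval` is hypothetical.

```python
import math


def hpd_interval(samples, mass=0.95):
    """Shortest interval containing a `mass` fraction of the samples.

    Generic estimate of a highest posterior density interval from MCMC draws;
    only meaningful for unimodal posteriors. Hypothetical helper.
    """
    if not samples or not 0 < mass <= 1:
        raise ValueError("need non-empty samples and 0 < mass <= 1")
    s = sorted(samples)
    n = len(s)
    # Number of draws the interval must cover; the small epsilon guards
    # against floating-point error in mass * n.
    k = max(1, math.ceil(mass * n - 1e-9))
    best = None
    # Slide a window of k consecutive sorted draws and keep the narrowest one.
    for i in range(n - k + 1):
        lo, hi = s[i], s[i + k - 1]
        if best is None or hi - lo < best[1] - best[0]:
            best = (lo, hi)
    return best


print(hpd_interval([1, 2, 3, 4, 100], mass=0.8))  # (1, 4): the outlier is excluded
```

Unlike an equal-tailed interval, the HPD interval can be asymmetric around the point estimate. This is consistent with the 4,040 to 5,000 BP range around 4,420 BP.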
Efficient estimation of AUC in a sliding window
In many applications, monitoring area under the ROC curve (AUC) in a sliding
window over a data stream is a natural way of detecting changes in the system.
The drawback is that computing AUC in a sliding window is expensive, especially
if the window size is large and the data flow is significant.
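For reference, a naive exact baseline helps show why the cost grows with the window. It keeps the last k (score, label) pairs and recomputes AUC from scratch on every update. AUC is computed as the fraction of correctly ordered positive-negative pairs, with ties counted as 1/2. Because this version re-sorts the window, each update costs O(k log k). The class and function names are hypothetical. Labels are assumed to be 1 for positive and 0 for negative.

```python
from collections import deque


def exact_auc(points):
    """Exact AUC of (score, label) pairs; label 1 = positive, 0 = negative.

    Counts positive-negative pairs where the positive scores higher, with
    ties credited 1/2. Returns None if either class is missing.
    """
    pts = sorted(points, key=lambda p: p[0])
    pos = sum(1 for _, y in pts if y == 1)
    neg = len(pts) - pos
    if pos == 0 or neg == 0:
        return None  # AUC is undefined without both classes
    correct = 0.0
    neg_below = 0  # negatives with a strictly smaller score
    i = 0
    while i < len(pts):
        # Handle a block of tied scores together.
        j = i
        while j < len(pts) and pts[j][0] == pts[i][0]:
            j += 1
        block_pos = sum(1 for _, y in pts[i:j] if y == 1)
        block_neg = (j - i) - block_pos
        correct += block_pos * (neg_below + 0.5 * block_neg)
        neg_below += block_neg
        i = j
    return correct / (pos * neg)


class SlidingAUC:
    """Naive sliding-window AUC: recompute from scratch on each update."""

    def __init__(self, k):
        self.window = deque(maxlen=k)  # the oldest point is dropped automatically

    def update(self, score, label):
        self.window.append((score, label))
        return exact_auc(self.window)


pts = [(0.1, 0), (0.4, 0), (0.35, 1), (0.8, 1)]
print(exact_auc(pts))  # 0.75
```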
In this paper we propose a scheme for maintaining an approximate AUC in a
sliding window of length $k$. More specifically, we propose an algorithm that,
given $\epsilon$, estimates AUC within $\epsilon/2$, and can maintain this
estimate in $O((\log k)/\epsilon)$ time per update as the window slides.
This provides a speed-up over the exact computation of AUC, which requires
$O(k)$ time per update. The speed-up becomes more significant as the size of
the window increases. Our estimate is based on grouping the data points
together, and using these groups to calculate AUC. The grouping is designed
carefully such that (i) the groups are small enough, so that the error stays
small, (ii) the number of groups is small, so that enumerating them is not
expensive, and (iii) the definition is flexible enough so that we can
maintain the groups efficiently.
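The grouping idea above can be illustrated with a deliberately simplified sketch. It is not the paper's grouping scheme, which is designed so that groups can be maintained efficiently as the window slides. Here scores are assumed to lie in [0, 1] and are bucketed into a fixed number of equal-width groups. Pairs in different groups are ordered exactly. Pairs in the same group are treated as ties and credited 1/2, so the error is at most half the fraction of positive-negative pairs that share a group. The name `grouped_auc` and the fixed-width bins are assumptions.

```python
def grouped_auc(points, num_bins):
    """Approximate AUC from group counts (simplified, fixed-width groups).

    Assumes scores in [0, 1] and labels 1 (positive) / 0 (negative).
    Pairs in different groups are ordered exactly; pairs within a group get
    credit 1/2, so the absolute error is at most
    0.5 * (same-group positive-negative pairs) / (positives * negatives).
    """
    pos_counts = [0] * num_bins
    neg_counts = [0] * num_bins
    for score, label in points:
        b = min(int(score * num_bins), num_bins - 1)  # a score of 1.0 goes to the last group
        if label == 1:
            pos_counts[b] += 1
        else:
            neg_counts[b] += 1
    pos, neg = sum(pos_counts), sum(neg_counts)
    if pos == 0 or neg == 0:
        return None  # AUC is undefined without both classes
    correct = 0.0
    neg_below = 0  # negatives in strictly lower groups
    # Enumerating groups costs O(num_bins) instead of touching every point pair.
    for b in range(num_bins):
        correct += pos_counts[b] * (neg_below + 0.5 * neg_counts[b])
        neg_below += neg_counts[b]
    return correct / (pos * neg)


pts = [(0.1, 0), (0.4, 0), (0.35, 1), (0.8, 1)]
print(grouped_auc(pts, 100))  # 0.75: every point falls in its own group
print(grouped_auc(pts, 1))  # 0.5: one group, so every pair counts as a tie
```

The number of groups trades accuracy for cost. More groups mean fewer pairs share a group, which gives a smaller error, while fewer groups make enumeration cheaper.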
Our experimental evaluation demonstrates that the average approximation error
in practice is much smaller than the approximation guarantee $\epsilon/2$,
and that we can achieve significant speed-ups with only a modest sacrifice in
accuracy.